Batcher's bitonic sort (cf. Knuth, v. III, pp. 232 ff) is a sorting network,
capable of sorting $n$ inputs in $O((\log n)^2)$ stages. When adapted to conventional
computers, it gives rise to an algorithm that runs in time $O(n (\log n)^2)$. The
method can also be adapted to ultracomputers (Schwartz [1979]) to exploit their
high degree of parallelism. The resulting algorithm will take time $O((\log N)^2)$ for
ultracomputers of "size" $N$. The implicit constant factor is low, so that even for
moderate values of $N$ the ultracomputer architecture performs faster than the
$O(N \log N)$ time that a conventional architecture can achieve.

The purpose of this note is to describe the adapted algorithm. After some
preliminaries a first version of the algorithm is given whose correctness is easily
shown. Next, this algorithm is transformed to make it suitable for an
ultracomputer.

Definition. A sequence $s_0, \ldots, s_{n-1}$ of elements from a totally ordered set is
bitonic if there exist $i$ and $j$, $0 \le i \le j \le n-1$, such that either

$s_i \le s_{i+1} \le \ldots \le s_j$ and $s_j \ge s_{j+1} \ge \ldots \ge s_{n-1} \ge s_0 \ge s_1 \ge \ldots \ge s_i$,

or

$s_i \ge s_{i+1} \ge \ldots \ge s_j$ and $s_j \le s_{j+1} \le \ldots \le s_{n-1} \le s_0 \le s_1 \le \ldots \le s_i$.

(If the sequence is made into a cycle by connecting the rear back to the front, this
means that both ways of going from $s_i$ to $s_j$ give an ordered "run.") Note that a
sequence of length $\le 3$ is always bitonic.
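The definition can be checked mechanically. The following is a minimal sketch, not part of the original note: the function name `is_bitonic` and the brute-force search over all pairs $(i, j)$ are illustrative choices, and treating the empty sequence as bitonic is an assumption.

```python
def is_bitonic(s):
    """Brute-force test of the cyclic definition of a bitonic sequence."""
    n = len(s)
    if n == 0:
        return True  # assumption: the empty sequence counts as bitonic

    def monotone(run, rising):
        pairs = zip(run, run[1:])
        if rising:
            return all(x <= y for x, y in pairs)
        return all(x >= y for x, y in pairs)

    for i in range(n):
        for j in range(i, n):
            # s_i, ..., s_j
            forward = [s[k] for k in range(i, j + 1)]
            # s_j, ..., s_{n-1}, s_0, ..., s_i (wrapping around the cycle)
            around = [s[k % n] for k in range(j, n + i + 1)]
            if monotone(forward, True) and monotone(around, False):
                return True
            if monotone(forward, False) and monotone(around, True):
                return True
    return False


print(is_bitonic([1, 3, 5, 4, 2]))  # one rise followed by one fall
print(is_bitonic([1, 3, 2, 4]))     # two peaks on the cycle
```

The search is quadratic in the number of candidate pairs and is meant only to mirror the definition, not to be efficient.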
Bitonic sort hinges on the following.

Lemma 1. Let $s_0, \ldots, s_{2n-1}$ be bitonic. For $i = 0, \ldots, n-1$, interchange $s_i$ and $s_{n+i}$ if
$s_{n+i} < s_i$. Then for the resulting sequence, both $s_0, \ldots, s_{n-1}$ and $s_n, \ldots, s_{2n-1}$ are bitonic.
Moreover, each of the elements $s_0, \ldots, s_{n-1}$ is less than or equal to each of the elements
$s_n, \ldots, s_{2n-1}$.

Proof: See Batcher [1968] or Stone [1971]. (The proofs given are rather informal.
A more formal proof would be elementary but not very enlightening; it
would proceed by distinguishing a number of cases.)
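As an illustration of Lemma 1, the interchange step can be sketched as follows. The name `half_clean` and the sample sequence are assumptions made for this example; they do not appear in the cited proofs.

```python
def half_clean(s):
    """Interchange s[i] and s[n+i] whenever s[n+i] < s[i], for a sequence of length 2n."""
    s = list(s)
    n = len(s) // 2
    for i in range(n):
        if s[n + i] < s[i]:
            s[i], s[n + i] = s[n + i], s[i]
    return s


sequence = [2, 5, 8, 9, 7, 4, 1, 0]   # bitonic: rises to 9, then falls
result = half_clean(sequence)
lower, upper = result[:4], result[4:]
print(lower, upper)
print(max(lower) <= min(upper))       # the "moreover" part of Lemma 1
```

Both halves of the result are again bitonic in the cyclic sense of the definition, and every element of the lower half is at most every element of the upper half.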

The elements to be sorted are stored in an array $a[0:N-1]$, where $N = 2^D$ for
some integer $D$. The indices of the array will often be written as bitstrings
(binary numbers) $b_{D-1} b_{D-2} \ldots b_0$, corresponding to the integer $b_{D-1} 2^{D-1} + \ldots + b_0 2^0$.
The notation $b_{H:L}$ denotes the substring $b_H b_{H-1} \ldots b_L$. (Note that the subscript
runs from high to low; in order to minimize confusion, capital letters will be used
for such subscripts.)

Definition. $\Pi$ stands for a mapping from the set of substrings $b_{H:L}$ to the set of
order relations $\le$ and $\ge$, satisfying: $\Pi(b_{H:H+1})$ is $\le$ (the substring $b_{H:H+1}$ is empty),
and $\Pi(b_{H:L+1}0) \ne \Pi(b_{H:L+1}1)$.
One possible solution is given by

$\Pi(b_{H:L})$ is $\le$ if $b_H \oplus b_{H-1} \oplus \ldots \oplus b_L = 0$,

$\Pi(b_{H:L})$ is $\ge$ if $b_H \oplus b_{H-1} \oplus \ldots \oplus b_L = 1$.

The symbol $\oplus$ stands for the "logical sum" or "exclusive or", so the summation
determines the parity of $b_{H:L}$. A simpler solution is given by: $\Pi(b_{H:L+1}0)$ is $\le$,
$\Pi(b_{H:L+1}1)$ is $\ge$. (By convention, $\Pi(b_{H:H+1})$ is $\le$ in either case.)
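Both solutions for $\Pi$ can be written down directly. The sketch below is illustrative only: the helper names `substring`, `pi_parity` and `pi_last_bit`, and the encoding of the two order relations as the strings `"<="` and `">="`, are assumptions of this example.

```python
def substring(b, high, low):
    """Bits b_high, ..., b_low of the integer b, listed from high to low.

    The list is empty when high < low, matching the empty substring b_{H:H+1}.
    """
    return [(b >> r) & 1 for r in range(high, low - 1, -1)]


def pi_parity(bits):
    """First solution: '<=' if the bits have even parity, '>=' otherwise."""
    parity = 0
    for bit in bits:
        parity ^= bit
    return "<=" if parity == 0 else ">="


def pi_last_bit(bits):
    """Simpler solution: '<=' if the last bit is 0 or the substring is empty, else '>='."""
    return "<=" if not bits or bits[-1] == 0 else ">="


bits = substring(0b1011, 3, 1)      # the substring b_{3:1} of the index 1011
print(bits, pi_parity(bits), pi_last_bit(bits))
```

Either function satisfies the two requirements of the definition: the empty substring maps to $\le$, and appending a 0 or a 1 to the same prefix gives opposite orders.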

The assertions of the correctness proof will use three predicates, defined
below. Let the array $a$ be (conceptually) divided into $2^{D-P}$ segments of $2^P$
elements each. The indices of the elements of a given segment are precisely those
which have a common initial bitstring $b_{D-1:P}$.

Definition. Ordered(P) stands for:

within each segment the elements are sorted in $\Pi(b_{D-1:P})$-order.

Definition. Bitonic(P) stands for:

each segment forms a bitonic sequence.

Let now each segment be subdivided into $2^{P-Q}$ subsegments, or boxes, of $2^Q$
elements each. If the elements of a segment were sorted in some order, each
element would end up in its destination box according to that order.

Definition. In_Boxes(P,Q) stands for:

within each segment the elements are (already) in their destination boxes
according to $\Pi(b_{D-1:P})$-order.

Lemma 2. If $0 \le P \le D$, then

(a) In_Boxes(P,P);

(b) if In_Boxes(P,0), then Ordered(P);

(c) for $P \ge 1$, if Ordered(P-1), then Bitonic(P).

Proof: As to (a), In_Boxes(P,P) means that the boxes coincide with the
segments. As there is only one destination box per segment, each element of a
segment must be in its destination box. As to (b), if In_Boxes(P,0), the boxes
have one element. So if within a segment the elements are in their destination
box, they must be in place and each segment is sorted. (Actually, In_Boxes(P,0)
is equivalent to Ordered(P).) As to (c), if Ordered(P-1), then for each
segment of length $2^P$ the lower half and the upper half are both sorted in
$\Pi(b_{D-1:P-1})$-order. For the lower half $b_{P-1} = 0$ and for the upper half
$b_{P-1} = 1$, so the upper half is sorted in the reverse of the order of the
lower half. The whole segment is then bitonic.

Definition. ich(H,P,Q), $0 \le Q < P \le H+1 \le D$, stands for the following action:

for all $b$, interchange a[b with $b_Q = 0$] and a[b with $b_Q = 1$] (the two elements
whose indices agree with $b$ except in bit $Q$) if they are not in
$\Pi(b_{H:P})$-order.
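A sequential rendering of ich may make the definition concrete. This is a sketch under stated assumptions: it uses the simpler solution for $\Pi$ (ascending when the lowest bit of the substring is 0 or the substring is empty), it returns a new list rather than updating processors in parallel, and the names `pi` and `ich` as Python functions are illustrative.

```python
def pi(b, high, low):
    """True for <=-order, False for >=-order, decided from b_{high:low}.

    Assumes the simpler solution: ascending iff the substring is empty or b_low = 0.
    """
    return high < low or ((b >> low) & 1) == 0


def ich(a, H, P, Q):
    """Compare-interchange every pair of elements whose indices differ only in bit Q."""
    a = list(a)
    for b in range(len(a)):
        if ((b >> Q) & 1) == 0:
            partner = b | (1 << Q)
            # b and partner share the bits b_{H:P}, because Q < P
            ascending = pi(b, H, P)
            out_of_order = a[b] > a[partner] if ascending else a[b] < a[partner]
            if out_of_order:
                a[b], a[partner] = a[partner], a[b]
    return a


# D = 2: pairs (0,1) are put in <=-order, pairs (2,3) in >=-order
print(ich([3, 1, 2, 0], 1, 1, 0))
```

On an actual ultracomputer all pairs would be handled simultaneously; the loop here only simulates that parallel step.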
Lemma 3. If $0 \le Q < P \le D$, then

{Bitonic(Q+1) & In_Boxes(P,Q+1)} ich(D-1,P,Q) {Bitonic(Q) & In_Boxes(P,Q)}.

Proof: This lemma is a generalization of Lemma 1 for sequences whose length is
a power of two. (Lemma 1 is obtained from Lemma 3 by taking $P = D$ and
$Q = D-1$.) The generalization follows by applying Lemma 1 to each (bitonic) box of
length $2^{Q+1}$ in a segment of length $2^P$. The boxes are then "refined" by splitting
each box into two halves (each of which receives again a bitonic sequence), and
its elements are divided over the two new boxes of length $2^Q$ according to
$\Pi(b_{D-1:P})$-order. Since the elements were already in their destination boxes of length
$2^{Q+1}$, they now reach their destination box of length $2^Q$.
First version of the algorithm:

{In_Boxes(0,0)}
{Ordered(0)}
for P = 1,2,...,D do
    {Ordered(P-1)}
    {Bitonic(P) & In_Boxes(P,P)}
    for Q = P-1,P-2,...,0 do
        {Bitonic(Q+1) & In_Boxes(P,Q+1)}
        ich(D-1,P,Q)
        {Bitonic(Q) & In_Boxes(P,Q)}
    end for Q
    {In_Boxes(P,0)}
    {Ordered(P)}
end for P
{Ordered(D)}.

Correctness proof: Each of the verification conditions is either trivially satisfied
or is an immediate consequence of Lemmas 2 and 3. The final assertion
Ordered(D) asserts that the whole array is sorted in $\le$-order.
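The first version can be simulated sequentially to see the double loop at work. The following is a sketch, not the note's own code: `pi` assumes the simpler solution for $\Pi$, `ich` is a sequential stand-in for the parallel interchange, and `bitonic_sort_v1` is a name chosen here.

```python
import random


def pi(b, high, low):
    """True for <=-order; assumes the simpler solution for Pi."""
    return high < low or ((b >> low) & 1) == 0


def ich(a, H, P, Q):
    """Compare-interchange all pairs of elements whose indices differ only in bit Q."""
    a = list(a)
    for b in range(len(a)):
        if ((b >> Q) & 1) == 0:
            partner = b | (1 << Q)
            ascending = pi(b, H, P)
            out_of_order = a[b] > a[partner] if ascending else a[b] < a[partner]
            if out_of_order:
                a[b], a[partner] = a[partner], a[b]
    return a


def bitonic_sort_v1(a):
    """First version of the algorithm; the length of a must be a power of two."""
    N = len(a)
    if N == 0 or N & (N - 1):
        raise ValueError("length must be a power of two")
    D = N.bit_length() - 1
    for P in range(1, D + 1):              # segments of length 2**P become ordered
        for Q in range(P - 1, -1, -1):     # boxes are refined from 2**(Q+1) to 2**Q
            a = ich(a, D - 1, P, Q)
    return a


data = [random.randrange(100) for _ in range(16)]
print(bitonic_sort_v1(data) == sorted(data))
```

The inner loop runs $P$ times for each $P$, which gives the $D(D+1)/2$ calls of ich behind the $O(D^2)$ estimate that follows.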

If the operation ich(D-1,P,Q) could be realized in time $O(1)$, the algorithm
would take time $O(D^2)$. If the elements of the array $a$ are stored in consecutive
processors of an ultracomputer, it is, however, not possible to compare two
arbitrary elements immediately, since not all processors are directly connected.
Consecutive processors are connected, so operations of the form ich(H,P,0) operate
in time $\Theta(1)$. Other connections are the shuffle lines, connecting each processor
$b_{D-1:0}$ to the processor $\sigma(b_{D-1:0}) = b_0 b_{D-1:1}$. Through this connection, the
following parallel assignments take time $O(1)$:

shuffle: for all $b$, $a[b] := a[\sigma(b)]$;
unshuffle: for all $b$, $a[\sigma(b)] := a[b]$.

The two operations permute $a$ and are each other's inverse.
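The two parallel assignments can be simulated on a list as follows. This is a minimal sketch: the functions take the number of index bits `D` as an explicit parameter, and they build a new list so that all assignments read the old values, as a parallel assignment requires. Both conventions are choices made for this example.

```python
def sigma(b, D):
    """Rotate the D-bit index right by one place: b_{D-1}...b_1 b_0 -> b_0 b_{D-1}...b_1."""
    return b if D == 0 else (b >> 1) | ((b & 1) << (D - 1))


def shuffle(a, D):
    """for all b, a[b] := a[sigma(b)]."""
    return [a[sigma(b, D)] for b in range(len(a))]


def unshuffle(a, D):
    """for all b, a[sigma(b)] := a[b]."""
    out = [None] * len(a)
    for b in range(len(a)):
        out[sigma(b, D)] = a[b]
    return out


cards = list(range(8))
print(shuffle(cards, 3))                  # the two halves are interleaved
print(unshuffle(cards, 3))                # even positions first, then odd positions
print(unshuffle(shuffle(cards, 3), 3) == cards)
```

With eight elements, shuffle interleaves the lower and upper halves of the array, which is the familiar perfect shuffle of a deck of cards.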

Let shuffle^Q stand for the null action if $Q = 0$, and for shuffle^{Q-1}; shuffle if
$Q \ge 1$. So shuffle^Q stands for:

for all $b$, $a[b] := a[\sigma^Q(b)]$.

Let unshuffle^Q be defined similarly.

Lemma 4. ich(D-1,P,Q), where $0 \le Q < P \le D$, is equivalent to

unshuffle^Q; ich(D-Q-1,P-Q,0); shuffle^Q.

Proof: The operation ich(D-1,P,Q) stands for:

for all $b$, interchange a[b with $b_Q = 0$] and a[b with $b_Q = 1$] if they are not in
$\Pi(b_{D-1:P})$-order.

Using the assignment rule, this is seen to be equivalent to

for all $b$, $a[\sigma^Q(b)] := a[b]$ (or unshuffle^Q);
for all $b$, interchange a[$\sigma^Q(b)$ with $b_Q = 0$]
and a[$\sigma^Q(b)$ with $b_Q = 1$]
if they are not in $\Pi(b_{D-1:P})$-order;
for all $b$, $a[b] := a[\sigma^Q(b)]$ (or shuffle^Q).

Substituting in the middle part $\sigma^{-Q}(b')$ for $b$, using $b_R = \sigma^{-Q}(b')_R = b'_{R-Q}$ for $R \ge Q$, we
obtain

for all $b'$, interchange a[b' with $b'_0 = 0$]
and a[b' with $b'_0 = 1$]
if they are not in $\Pi(b'_{D-Q-1:P-Q})$-order.

This is exactly the meaning of ich(D-Q-1,P-Q,0).
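Lemma 4 can also be checked numerically on small arrays. The sketch below is an empirical check, not a proof. It assumes the simpler solution for $\Pi$, simulates the parallel steps sequentially, and uses helper names (`repeat`, `lemma4_rhs`, `check_lemma4`) that are introduced here for illustration.

```python
import random


def pi(b, high, low):
    """True for <=-order; assumes the simpler solution for Pi."""
    return high < low or ((b >> low) & 1) == 0


def ich(a, H, P, Q):
    a = list(a)
    for b in range(len(a)):
        if ((b >> Q) & 1) == 0:
            partner = b | (1 << Q)
            ascending = pi(b, H, P)
            out_of_order = a[b] > a[partner] if ascending else a[b] < a[partner]
            if out_of_order:
                a[b], a[partner] = a[partner], a[b]
    return a


def sigma(b, D):
    return b if D == 0 else (b >> 1) | ((b & 1) << (D - 1))


def shuffle(a, D):
    return [a[sigma(b, D)] for b in range(len(a))]


def unshuffle(a, D):
    out = [None] * len(a)
    for b in range(len(a)):
        out[sigma(b, D)] = a[b]
    return out


def repeat(op, a, D, times):
    """Apply shuffle or unshuffle the given number of times."""
    for _ in range(times):
        a = op(a, D)
    return a


def lemma4_rhs(a, D, P, Q):
    """unshuffle^Q; ich(D-Q-1, P-Q, 0); shuffle^Q."""
    a = repeat(unshuffle, a, D, Q)
    a = ich(a, D - Q - 1, P - Q, 0)
    return repeat(shuffle, a, D, Q)


def check_lemma4(D, trials=20, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        a = [rng.randrange(100) for _ in range(1 << D)]
        for P in range(1, D + 1):
            for Q in range(P):
                if ich(a, D - 1, P, Q) != lemma4_rhs(a, D, P, Q):
                    return False
    return True


print(check_lemma4(4))
```

The check compares both sides for every admissible pair $(P, Q)$ on random inputs, so agreement on these samples supports the lemma without replacing the argument above.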

Using Lemma 4, the algorithm may be transformed to:

for P = 1,2,...,D do
    for Q = P-1,P-2,...,0 do
        unshuffle^Q;
        ich(D-Q-1,P-Q,0);
        shuffle^Q
    end for Q
end for P.

This intermediate version would require time $\Theta(D^3)$.
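The cubic bound can be worked out explicitly. Assuming, as a cost model for this illustration, one time unit for each single shuffle, unshuffle and ich(H,P,0), the body of the inner loop costs $Q + 1 + Q$ units, so the total is:

```latex
\sum_{P=1}^{D} \sum_{Q=0}^{P-1} (2Q + 1)
  \;=\; \sum_{P=1}^{D} P^{2}
  \;=\; \frac{D(D+1)(2D+1)}{6}
  \;=\; \Theta(D^{3}).
```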
Lemma 5. For $K \ge 0$,

    LOOP_K = for Q = K,K-1,...,0 do unshuffle^Q; S(Q); shuffle^Q end,

where S(Q) is any statement depending on Q, is equivalent to

    unshuffle^{K+1}; LOOP'_K,

where

    LOOP'_K = for Q = K,K-1,...,0 do shuffle; S(Q) end.

Proof: By induction on K. LOOP_0 and unshuffle; LOOP'_0 reduce to an obvious
equivalence. For larger K, we see that LOOP_K is equivalent to

    unshuffle^K; S(K); shuffle^K; LOOP_{K-1}

by moving the first execution of the loop body outside. By the inductive
hypothesis, this is equivalent to

    unshuffle^K; S(K); shuffle^K; unshuffle^K; LOOP'_{K-1}

which again is equivalent to

    unshuffle^{K+1}; shuffle; S(K); LOOP'_{K-1}.

Moving shuffle; S(K) inside the loop, we obtain

    unshuffle^{K+1}; LOOP'_K.
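The equivalence in Lemma 5 can be exercised with an arbitrary statement S(Q). In the sketch below, `sample_statement` is a hypothetical S(Q) invented for this example; `loop` and `loop_prime` are sequential renderings of LOOP_K and of unshuffle^{K+1}; LOOP'_K, and all names are illustrative.

```python
def sigma(b, D):
    return b if D == 0 else (b >> 1) | ((b & 1) << (D - 1))


def shuffle(a, D):
    return [a[sigma(b, D)] for b in range(len(a))]


def unshuffle(a, D):
    out = [None] * len(a)
    for b in range(len(a)):
        out[sigma(b, D)] = a[b]
    return out


def repeat(op, a, D, times):
    for _ in range(times):
        a = op(a, D)
    return a


def loop(a, D, K, S):
    """LOOP_K: for Q = K,...,0 do unshuffle^Q; S(Q); shuffle^Q end."""
    for Q in range(K, -1, -1):
        a = repeat(unshuffle, a, D, Q)
        a = S(a, Q)
        a = repeat(shuffle, a, D, Q)
    return a


def loop_prime(a, D, K, S):
    """unshuffle^(K+1); LOOP'_K, with LOOP'_K: for Q = K,...,0 do shuffle; S(Q) end."""
    a = repeat(unshuffle, a, D, K + 1)
    for Q in range(K, -1, -1):
        a = shuffle(a, D)
        a = S(a, Q)
    return a


def sample_statement(a, Q):
    """Hypothetical S(Q): add Q + 1 to the first element and reverse the rest."""
    return [a[0] + Q + 1] + a[:0:-1]


def check_lemma5(D=3, max_K=5):
    a = list(range(1 << D))
    return all(
        loop(a, D, K, sample_statement) == loop_prime(a, D, K, sample_statement)
        for K in range(max_K + 1)
    )


print(check_lemma5())
```

The sample statement does not commute with the shuffles, so the check is not trivially satisfied; it relies only on shuffle and unshuffle being each other's inverse.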

By this lemma, we finally obtain

Algorithm for bitonic sort on ultracomputers

for P = 1,2,...,D do
    unshuffle^P;
    for Q = P-1,P-2,...,0 do
        shuffle;
        ich(D-Q-1,P-Q,0)
    end for Q
end for P.

This algorithm clearly takes time $\Theta(D^2) = \Theta((\log N)^2)$.
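A sequential simulation of the final algorithm, with a count of the unit steps, might look as follows. This is a sketch under stated assumptions: `pi` uses the simpler solution for $\Pi$, each shuffle, unshuffle and ich(H,P,0) is counted as one step, and the name `ultra_bitonic_sort` is introduced here.

```python
import random


def pi(b, high, low):
    """True for <=-order; assumes the simpler solution for Pi."""
    return high < low or ((b >> low) & 1) == 0


def ich0(a, H, P):
    """ich(H, P, 0): compare-interchange neighbouring elements a[2k], a[2k+1]."""
    a = list(a)
    for b in range(0, len(a), 2):
        ascending = pi(b, H, P)
        out_of_order = a[b] > a[b + 1] if ascending else a[b] < a[b + 1]
        if out_of_order:
            a[b], a[b + 1] = a[b + 1], a[b]
    return a


def sigma(b, D):
    return b if D == 0 else (b >> 1) | ((b & 1) << (D - 1))


def shuffle(a, D):
    return [a[sigma(b, D)] for b in range(len(a))]


def unshuffle(a, D):
    out = [None] * len(a)
    for b in range(len(a)):
        out[sigma(b, D)] = a[b]
    return out


def ultra_bitonic_sort(a):
    """Return (sorted list, number of unit steps); len(a) must be a power of two."""
    N = len(a)
    if N == 0 or N & (N - 1):
        raise ValueError("length must be a power of two")
    D = N.bit_length() - 1
    steps = 0
    for P in range(1, D + 1):
        for _ in range(P):                      # unshuffle^P
            a = unshuffle(a, D)
            steps += 1
        for Q in range(P - 1, -1, -1):
            a = shuffle(a, D)
            a = ich0(a, D - Q - 1, P - Q)
            steps += 2
    return a, steps


data = [random.randrange(100) for _ in range(16)]
result, steps = ultra_bitonic_sort(data)
print(result == sorted(data), steps)
```

Under this cost model each value of $P$ contributes $3P$ steps, for a total of $3D(D+1)/2$, in line with the $\Theta(D^2)$ bound.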

Remark. The idea of using shuffles to implement bitonic sort is described in
Stone [1971].